Abstract

We generalize a result in [8] and derive an asymptotic formula for the entropy rate of a hidden Markov chain around a "weak Black Hole". We also discuss applications of the asymptotic formula to the asymptotic behavior of certain channels.

Index Terms: entropy, entropy rate, hidden Markov chain, hidden Markov model, hidden Markov process



1 Introduction 

Consider a discrete finite-valued stationary stochastic process $Y = Y_{-\infty}^{\infty} := \{Y_n : n \in \mathbb{Z}\}$. The entropy rate of $Y$ is defined to be

$$H(Y) = \lim_{n \to \infty} H(Y_{-n}^0)/(n+1);$$

here, $H(Y_{-n}^0)$ denotes the joint entropy of $Y_{-n}^0 := \{Y_{-n}, Y_{-n+1}, \ldots, Y_0\}$, and $\log$ is taken to mean the natural logarithm.

If $Y$ is a Markov chain with alphabet $\{1, 2, \ldots, B\}$ and transition probability matrix $A$, it is well known that $H(Y)$ can be explicitly expressed with the stationary vector of $Y$ and $A$. A function $Z = Z_{-\infty}^{\infty}$ of the Markov chain $Y$ with the form $Z = \Phi(Y)$ is called a hidden Markov chain; here $\Phi$ is a function defined on $\{1, 2, \ldots, B\}$, taking values in $\mathcal{A} := \{1, 2, \ldots, A\}$ (alternatively a hidden Markov chain is defined as a Markov chain observed in noise). For a hidden Markov chain, $H(Z)$ turns out (see Equation (1)) to be the integral of a certain function defined on a simplex with respect to a measure due to Blackwell [4]. However, Blackwell's measure is somewhat complicated and the integral formula appears to be difficult to evaluate in most cases. In general it is very difficult to compute $H(Z)$; so far there is no simple and explicit formula for $H(Z)$.
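The explicit expression for the Markov case mentioned above is $H(Y) = -\sum_i \pi_i \sum_j A_{ij} \log A_{ij}$, where $\pi$ is the stationary vector. The following is a minimal sketch of that computation, not taken from the paper; the function names and the use of a lazy power iteration to approximate $\pi$ are assumptions made for illustration, and $A$ is assumed irreducible.

```python
import math

def stationary_vector(A, iters=3000):
    """Approximate pi with pi = pi A by iterating the lazy chain (I + A) / 2."""
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]
        # Averaging with the previous iterate avoids oscillation for periodic chains.
        pi = [0.5 * (u + v) for u, v in zip(pi, step)]
    return pi

def markov_entropy_rate(A):
    """Entropy rate in nats: -sum_i pi_i sum_j A_ij log A_ij (zero entries skipped)."""
    pi = stationary_vector(A)
    n = len(A)
    return -sum(pi[i] * A[i][j] * math.log(A[i][j])
                for i in range(n) for j in range(n) if A[i][j] > 0)

# Hypothetical two-state chain with one forbidden transition.
A = [[0.5, 0.5],
     [1.0, 0.0]]
print(markov_entropy_rate(A))
```

For this chain the stationary vector is $(2/3, 1/3)$, so the printed value should be close to $(2/3)\log 2$.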

Recently, the problem of computing the entropy rate of a hidden Markov chain Z has 
drawn much interest, and many approaches have been adopted to tackle this problem. For 



instance, Blackwell's measure has been used to bound the entropy rate [15] and a variation on 
the Birch bound [3] was introduced in [5]. An efficient Monte Carlo method for computing 
the entropy rate of a hidden Markov chain was proposed independently by Arnold and 
Loeliger pQ, Pfister et al. p2], and Sharma and Singh [19]. The connection between the
entropy rate of a hidden Markov chain and the top Lyapunov exponent of a random matrix 
product has been observed (101 [TTJ [121 E]- In [7], it is shown that under mild positivity 
assumptions the entropy rate of a hidden Markov chain varies analytically as a function of 
the underlying Markov chain parameters. 

Another recent approach is based on computing the coefficients of an asymptotic expansion of the entropy rate around certain values of the Markov and channel parameters. The first result along these lines was presented in [12], where for a binary symmetric channel with crossover probability $\varepsilon$ (denoted by BSC($\varepsilon$)), the Taylor expansion of $H(Z)$ around $\varepsilon = 0$ is studied for a binary hidden Markov chain of order one. In particular, the first derivative of $H(Z)$ at $\varepsilon = 0$ is expressed very compactly as a Kullback-Leibler divergence between two distributions on binary triplets, derived from the marginal of the input process $X$. Further improvements and new methods for the asymptotic expansion approach were obtained in [16], [20], [21] and [8]. In [16] the authors express the entropy rate for a binary hidden Markov chain where one of the transition probabilities is equal to zero as an asymptotic expansion including an $O(\varepsilon \log \varepsilon)$ term.

This paper is organized as follows. In Section 2 we give an asymptotic formula (Theorem 2.8) for the entropy rate of a hidden Markov chain around a weak Black Hole. The coefficients in the formula can be computed in principle (although explicit computations may be quite complicated in general). The formula can be viewed as a generalization of the corresponding result for the Black Hole case considered in [8]. The weak Black Hole case is important for hidden Markov chains obtained as output processes of noisy channels, corresponding to input processes for which certain sequences have probability zero. Examples are given in Section 3. Example 3.1 was already treated in [9] for only the first few coefficients; but in this case, these coefficients were computed quite explicitly.

2 Asymptotic Formula for Entropy Rate 

Let $W$ be the simplex, comprising the vectors

$$\{w = (w_1, w_2, \ldots, w_B) \in \mathbb{R}^B : w_i \ge 0, \ \sum_i w_i = 1\},$$

and let $W_a$ be all $w \in W$ with $w_i = 0$ for $\Phi(i) \ne a$. For $a \in \mathcal{A}$, let $A_a$ denote the $B \times B$ matrix such that $A_a(i,j) = A(i,j)$ for $j$ with $\Phi(j) = a$, and $A_a(i,j) = 0$ otherwise. For $a \in \mathcal{A}$, define the scalar-valued and vector-valued functions $r_a$ and $f_a$ on $W$ by

$$r_a(w) = w A_a \mathbf{1},$$

and

$$f_a(w) = w A_a / r_a(w).$$

Note that $f_a$ defines the action of the matrix $A_a$ on the simplex $W$.

If $Y$ is irreducible, it turns out that

$$H(Z) = -\int \sum_a r_a(w) \log r_a(w) \, dQ(w), \qquad (1)$$

where $Q$ is Blackwell's measure [4] on $W$. This measure, which satisfies an integral equation dependent on the parameters of the process, is however very hard to extract from the equation in any explicit way.
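The objects $A_a$, $r_a$ and $f_a$ entering formula (1) are simple to compute for a concrete chain. The sketch below is illustrative only: the three-state matrix, the map `phi` and the helper names are hypothetical and do not come from the paper.

```python
def restrict_columns(A, phi, a):
    """Return A_a: column j of A is kept when phi[j] == a and zeroed otherwise."""
    n = len(A)
    return [[A[i][j] if phi[j] == a else 0.0 for j in range(n)] for i in range(n)]

def r(w, A_a):
    """Scalar r_a(w) = w A_a 1."""
    n = len(A_a)
    return sum(w[i] * A_a[i][j] for i in range(n) for j in range(n))

def f(w, A_a):
    """Vector f_a(w) = w A_a / r_a(w); only defined when r_a(w) > 0."""
    n = len(A_a)
    v = [sum(w[i] * A_a[i][j] for i in range(n)) for j in range(n)]
    total = sum(v)
    if total <= 0:
        raise ValueError("r_a(w) must be positive")
    return [x / total for x in v]

# Hypothetical chain: states 0 and 1 emit symbol 0, state 2 emits symbol 1.
A = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]
phi = [0, 0, 1]
w = [0.2, 0.5, 0.3]
A0 = restrict_columns(A, phi, 0)
A1 = restrict_columns(A, phi, 1)
print(r(w, A0) + r(w, A1))   # the r_a(w) form a probability vector over symbols
print(f(w, A1))              # f_a(w) lies in W_a
```

Since $A$ is stochastic and $w \in W$, the values $r_a(w)$ sum to one over $a$, and $f_a(w)$ is supported on the states $i$ with $\Phi(i) = a$.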

Definition 2.1. (see [8]) Suppose that for every $a \in \mathcal{A}$, $A_a$ is a rank one matrix, and every column of $A_a$ is either strictly positive or all zeros. We call this the Black Hole case.

It was shown [8] that $H(Z)$ is analytic around a Black Hole and the derivatives of $H(Z)$ can be exactly computed around a Black Hole. In the sequel, we consider weakened assumptions and prove an asymptotic formula for the entropy rate of a hidden Markov chain around a "weak Black Hole", generalizing the corresponding result in [8].

Definition 2.2. Suppose that for every $a \in \mathcal{A}$, $A_a$ is either an all zero matrix or a rank one matrix. We call this the weak Black Hole case.
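Definitions 2.1 and 2.2 can be checked mechanically for a given pair $(A, \Phi)$. The sketch below is an illustration under stated assumptions: it tests rank one by checking that all $2 \times 2$ minors vanish up to a tolerance, it takes the symbol set to be the image of `phi`, and the function names and example matrices are hypothetical.

```python
def is_rank_one(M, tol=1e-12):
    """True when M is nonzero and every 2x2 minor vanishes (up to tol)."""
    n, m = len(M), len(M[0])
    if all(abs(x) <= tol for row in M for x in row):
        return False
    for i in range(n):
        for k in range(i + 1, n):
            for j in range(m):
                for l in range(j + 1, m):
                    if abs(M[i][j] * M[k][l] - M[i][l] * M[k][j]) > tol:
                        return False
    return True

def classify(A, phi, tol=1e-12):
    """Return 'Black Hole', 'weak Black Hole' or 'neither' for the pair (A, phi)."""
    n = len(A)
    black, weak = True, True
    for a in sorted(set(phi)):
        A_a = [[A[i][j] if phi[j] == a else 0.0 for j in range(n)] for i in range(n)]
        zero = all(abs(x) <= tol for row in A_a for x in row)
        rank1 = is_rank_one(A_a, tol)
        # Definition 2.1 also asks each column to be strictly positive or all zero.
        cols_ok = all(all(x > tol for x in col) or all(abs(x) <= tol for x in col)
                      for col in zip(*A_a))
        weak = weak and (zero or rank1)
        black = black and rank1 and cols_ok
    return "Black Hole" if black else ("weak Black Hole" if weak else "neither")

def bsc_at_zero(P):
    """Joint chain (X, E) of a binary input P through a noiseless BSC; states (0,0),(0,1),(1,0),(1,1)."""
    rows = [[P[x][0], 0.0, P[x][1], 0.0] for x in (0, 0, 1, 1)]
    return rows, [0, 1, 1, 0]   # phi(x, e) = x XOR e

print(classify(*bsc_at_zero([[0.7, 0.3], [0.4, 0.6]])))
print(classify(*bsc_at_zero([[0.5, 0.5], [1.0, 0.0]])))
```

The first input chain has all transitions positive, the second forbids one transition; the two calls illustrate the difference between the two definitions.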

We use the standard notation: by $\alpha = \Theta(\beta)$, we mean there exist positive constants $C_1, C_2$ such that $C_1|\beta| \le |\alpha| \le C_2|\beta|$, while by $\alpha = O(\beta)$, we mean there exists a positive constant $C$ such that $|\alpha| \le C|\beta|$. For a given analytic function $f(\varepsilon)$ around $\varepsilon = 0$, let $\mathrm{ord}(f(\varepsilon))$ denote its order, i.e., the degree of the first non-zero term of its Taylor series expansion around $\varepsilon = 0$. Note that for an analytic function $f(\varepsilon)$ around $\varepsilon = 0$,

$$f(\varepsilon) = \Theta(\varepsilon^k) \iff \mathrm{ord}(f(\varepsilon)) = k.$$

We say $A(\varepsilon)$ is normally parameterized by $\varepsilon$ ($\varepsilon \ge 0$) if

1. each entry of $A(\varepsilon)$ is an analytic function at $\varepsilon = 0$,

2. when $\varepsilon > 0$, $A(\varepsilon)$ is (non-negative and) irreducible,

3. $A(0)$ is a weak Black Hole.

In the following, expressions like $p_X(x)$ will be used to mean $P(X = x)$ and we drop the subscripts if the context is clear: $p(x), p(z)$ mean $P(X = x), P(Z = z)$, respectively, and further $p(y|x), p(z_0|z_{-n}^{-1})$ mean $P(Y = y|X = x), P(Z_0 = z_0|Z_{-n}^{-1} = z_{-n}^{-1})$, respectively.

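Proposition 2.3. Suppose that $A(\varepsilon)$ is analytically parameterized by $\varepsilon \ge 0$ and when $\varepsilon > 0$, $A(\varepsilon)$ is non-negative and irreducible. Then for any fixed hidden Markov sequence $z_{-n}^0 \in \mathcal{A}^{n+1}$,

1. $p(z_{-n}^{-1})$ is analytic around $\varepsilon = 0$;

2. $p(y_i = \cdot \,|z_{-n}^i) := (p(y_i = b|z_{-n}^i) : b = 1, 2, \ldots, B)$ is analytic around $\varepsilon = 0$, where $\cdot$ denotes the $B$ possible states of the Markov chain $Y$;

3. $p(z_0|z_{-n}^{-1})$ is analytic around $\varepsilon = 0$.

Proof. 1. When $\varepsilon > 0$, $A(\varepsilon)$ is non-negative and irreducible. By Perron-Frobenius theory [18], $A(\varepsilon)$ has a unique positive stationary vector, say $\pi(\varepsilon)$. Since

$$\mathrm{adj}(I - A(\varepsilon))(I - A(\varepsilon)) = \det(I - A(\varepsilon)) I = 0$$

(here $\mathrm{adj}(\cdot)$ denotes the adjugate operator on matrices), one can choose $\pi(\varepsilon)$ to be any normalized row vector of $\mathrm{adj}(I - A(\varepsilon))$. So $\pi(\varepsilon)$ can be written as

$$\frac{(\pi_1(\varepsilon), \pi_2(\varepsilon), \ldots, \pi_B(\varepsilon))}{\pi_1(\varepsilon) + \pi_2(\varepsilon) + \cdots + \pi_B(\varepsilon)},$$

where the $\pi_i(\varepsilon)$'s are non-negative analytic functions of $\varepsilon$ and the first non-zero term of every $\pi_i(\varepsilon)$'s Taylor series expansion has a positive coefficient. Then we conclude that for each $i$

$$\mathrm{ord}(\pi_i(\varepsilon)) \ge \mathrm{ord}(\pi_1(\varepsilon) + \cdots + \pi_B(\varepsilon)),$$

and thus $\pi(\varepsilon)$, which is uniquely defined on $\varepsilon > 0$, can be continuously extended to $\varepsilon = 0$ via setting $\pi(0) = \lim_{\varepsilon \to 0} \pi(\varepsilon)$.
Now

$$p(z_{-n}^{-1}) = \pi(\varepsilon) A_{z_{-n}} \cdots A_{z_{-1}} \mathbf{1} = \frac{(\pi_1(\varepsilon), \ldots, \pi_B(\varepsilon)) A_{z_{-n}} \cdots A_{z_{-1}} \mathbf{1}}{\pi_1(\varepsilon) + \pi_2(\varepsilon) + \cdots + \pi_B(\varepsilon)} =: \frac{f(\varepsilon)}{g(\varepsilon)}; \qquad (2)$$

here $\mathrm{ord}(f(\varepsilon)) \ge \mathrm{ord}(g(\varepsilon))$. It then follows that $p(z_{-n}^{-1})$ is analytic around $\varepsilon = 0$.

2. Let $x_{i,-n} = x_{i,-n}(z_{-n}^i)$ denote $p(y_i = \cdot \,|z_{-n}^i)$. Then one checks that $x_{i,-n}$ satisfies the following iteration:

$$x_{i,-n} = \frac{x_{i-1,-n} A_{z_i}}{x_{i-1,-n} A_{z_i} \mathbf{1}}, \qquad -n \le i \le -1, \qquad (3)$$

starting with $x_{-n-1,-n} = p(y_{-n-1} = \cdot\,)$. Because $A$ is analytically parameterized by $\varepsilon$ ($\varepsilon \ge 0$) and $A(\varepsilon)$ is non-negative and irreducible when $\varepsilon > 0$, inductively we can prove (the proof is similar to the proof of 1.) that for any $i$, $x_{i,-n}$ can be written as follows:

$$\frac{(f_1(\varepsilon), f_2(\varepsilon), \ldots, f_B(\varepsilon))}{f_1(\varepsilon) + f_2(\varepsilon) + \cdots + f_B(\varepsilon)},$$

where the $f_i(\varepsilon)$'s are analytic functions around $\varepsilon = 0$. Note that for each $i$

$$\mathrm{ord}(f_i(\varepsilon)) \ge \mathrm{ord}(f_1(\varepsilon) + f_2(\varepsilon) + \cdots + f_B(\varepsilon)).$$

The existence of the Taylor series expansion of $x_{i,-n}$ around $\varepsilon = 0$ (for any $i$) then follows.

3. One checks that

$$p(z_0|z_{-n}^{-1}) = x_{-1,-n} A_{z_0} \mathbf{1}. \qquad (4)$$

Analyticity of $p(z_0|z_{-n}^{-1})$ immediately follows from (4) and analyticity of $x_{-1,-n}$ around $\varepsilon = 0$, which has been shown in 2.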

□ 
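Part 1 of the proof picks the stationary vector as a normalized row of $\mathrm{adj}(I - A)$. The following sketch carries out that construction in exact rational arithmetic for small matrices; it is an illustration, not the paper's procedure for any computation, and the cofactor-expansion determinant and the function names are choices made here.

```python
from fractions import Fraction

def det(M):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def adjugate(M):
    """adj(M)[i][j] = (-1)^(i+j) * det(M with row j and column i removed)."""
    n = len(M)
    def minor(i, j):
        return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]
    return [[(-1) ** (i + j) * det(minor(j, i)) for j in range(n)] for i in range(n)]

def stationary_from_adjugate(A):
    """Normalize a nonzero row of adj(I - A); assumes A is irreducible and stochastic."""
    n = len(A)
    M = [[Fraction(int(i == j)) - Fraction(A[i][j]) for j in range(n)] for i in range(n)]
    row = next(r for r in adjugate(M) if any(r))
    total = sum(row)
    return [x / total for x in row]

A = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1), Fraction(0)]]
print(stationary_from_adjugate(A))
```

Because $\mathrm{adj}(I - A)(I - A) = 0$, every row $v$ of the adjugate satisfies $v = vA$, which is why normalizing a nonzero row gives the stationary vector.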

Lemma 2.4. Consider two formal series expansions $f(x), g(x) \in \mathbb{R}[[x]]$ such that $f(x) = \sum_{i=0}^{\infty} f_i x^i$ and $g(x) = \sum_{i=0}^{\infty} g_i x^i$, where $g_0 \ne 0$. Let $h(x) \in \mathbb{R}[[x]]$ be the quotient of $f(x)$ and $g(x)$ with $h(x) = \sum_{i=0}^{\infty} h_i x^i$. Then $h_i$ is a function dependent only on $f_0, \ldots, f_i$ and $g_0, \ldots, g_i$.

Proof. Comparing the coefficients of all the terms in the following identity:

$$\left(\sum_{i=0}^{\infty} h_i x^i\right)\left(\sum_{j=0}^{\infty} g_j x^j\right) = \sum_{i=0}^{\infty} f_i x^i,$$

we obtain that for any $i$,

$$h_0 g_i + h_1 g_{i-1} + \cdots + h_i g_0 = f_i.$$

The lemma then follows from an induction (on $i$) argument.

□ 
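The recursion in the proof of Lemma 2.4 can be run directly: solving $h_0 g_i + \cdots + h_i g_0 = f_i$ for $h_i$ uses only $f_0, \ldots, f_i$ and $g_0, \ldots, g_i$. A minimal sketch with exact rationals follows; the function name and the list-of-coefficients representation are assumptions for illustration.

```python
from fractions import Fraction

def series_quotient(f, g, n_terms):
    """First n_terms coefficients of f(x)/g(x); f and g are coefficient lists, g[0] != 0."""
    if not g or g[0] == 0:
        raise ValueError("g_0 must be non-zero")
    h = []
    for i in range(n_terms):
        f_i = f[i] if i < len(f) else 0
        # Contribution of the already known h_0, ..., h_{i-1}.
        known = sum(h[j] * (g[i - j] if i - j < len(g) else 0) for j in range(i))
        h.append((Fraction(f_i) - known) / Fraction(g[0]))
    return h

# 1 / (1 - x) = 1 + x + x^2 + ...
print(series_quotient([1], [1, -1], 5))
```

Each `h[i]` is computed from `f[:i+1]` and `g[:i+1]` only, which is exactly the dependence the lemma asserts.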

By Proposition 2.3, for any hidden Markov string $z_{-m}^0$, the Taylor series expansion of $p(z_0|z_{-m}^{-1})$ around $\varepsilon = 0$ exists. We use $b_j(z_{-m}^0)$ to represent the coefficient of $\varepsilon^j$ in the expansion, namely

$$p(z_0|z_{-m}^{-1}) = b_0(z_{-m}^0) + b_1(z_{-m}^0)\varepsilon + b_2(z_{-m}^0)\varepsilon^2 + \cdots. \qquad (5)$$

The following lemma shows that under certain conditions, some coefficients $b_j(z_{-m}^0)$ "stabilize". More precisely, we have:

Lemma 2.5. Consider a hidden Markov chain $Z$ with normally parameterized $A(\varepsilon)$. For two fixed hidden Markov chain sequences $z_{-m}^0, \hat{z}_{-\hat{m}}^0$ such that

$$z_{-n}^0 = \hat{z}_{-n}^0, \qquad \mathrm{ord}(p(z_{-n}^{-1}|z_{-m}^{-n-1})), \ \mathrm{ord}(p(\hat{z}_{-n}^{-1}|\hat{z}_{-\hat{m}}^{-n-1})) \le k$$

for some $n \le m, \hat{m}$ and some $k$, we have for $j$ with $0 \le j \le n - 4k - 1$,

$$b_j(z_{-m}^0) = b_j(\hat{z}_{-\hat{m}}^0).$$

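Proof. Recall that $x_{i,-m} = x_{i,-m}(z_{-m}^i) = p(y_i = \cdot \,|z_{-m}^i)$ and $\hat{x}_{i,-\hat{m}} = \hat{x}_{i,-\hat{m}}(\hat{z}_{-\hat{m}}^i) = p(y_i = \cdot \,|\hat{z}_{-\hat{m}}^i)$, where $\cdot$ denotes the possible states of the Markov chain $Y$. Consider the Taylor series expansions of $x_{i,-m}$, $\hat{x}_{i,-\hat{m}}$ around $\varepsilon = 0$,

$$x_{i,-m} = a_0(z_{-m}^i) + a_1(z_{-m}^i)\varepsilon + a_2(z_{-m}^i)\varepsilon^2 + \cdots \qquad (6)$$

$$\hat{x}_{i,-\hat{m}} = \hat{a}_0(\hat{z}_{-\hat{m}}^i) + \hat{a}_1(\hat{z}_{-\hat{m}}^i)\varepsilon + \hat{a}_2(\hat{z}_{-\hat{m}}^i)\varepsilon^2 + \cdots \qquad (7)$$

We shall show that $a_j(z_{-m}^i) = \hat{a}_j(\hat{z}_{-\hat{m}}^i)$ for $j$ with

$$0 \le j \le n + i - \sum_{l=-n}^{i} \max\{J(z_{-m}^l), J(\hat{z}_{-\hat{m}}^l)\},$$

where for any hidden Markov sequence $z_{-m}^i$,

$$J(z_{-m}^i) = \begin{cases} 1 + \mathrm{ord}(p(z_i|z_{-m}^{i-1})) & \text{if } \mathrm{ord}(p(z_i|z_{-m}^{i-1})) > 0, \\ 0 & \text{if } \mathrm{ord}(p(z_i|z_{-m}^{i-1})) = 0. \end{cases}$$

Recall that, as in (3) and (4), $x_{i+1,-m} = x_{i,-m}A_{z_{i+1}} / (x_{i,-m}A_{z_{i+1}}\mathbf{1})$ and $p(z_{i+1}|z_{-m}^i) = x_{i,-m}A_{z_{i+1}}\mathbf{1}$.
Now with (6) and (7), we have

$$x_{i,-m}A_{z_{i+1}}(\varepsilon) = \sum_{j=0}^{\infty} a_j(z_{-m}^i)\varepsilon^j \sum_{k=0}^{\infty} \frac{A_{z_{i+1}}^{(k)}(0)}{k!}\varepsilon^k = \sum_{j=0}^{\infty} c_j(z_{-m}^{i+1})\varepsilon^j, \qquad (9)$$

where the superscript $(k)$ denotes the $k$-th order derivative with respect to $\varepsilon$.
We proceed by induction on $i$ (from $-n$ to $-1$).

First consider the case when $i = -n$. When $\max\{J(z_{-m}^{-n}), J(\hat{z}_{-\hat{m}}^{-n})\} > 0$, the statement is vacuously true; when $J(z_{-m}^{-n}) = J(\hat{z}_{-\hat{m}}^{-n}) = 0$, necessarily $A_{z_{-n}}(0)$ is a rank one matrix, $a_0(z_{-m}^{-n-1})A_{z_{-n}}(0)\mathbf{1} > 0$ and $\hat{a}_0(\hat{z}_{-\hat{m}}^{-n-1})A_{z_{-n}}(0)\mathbf{1} > 0$. Then we have

$$a_0(z_{-m}^{-n}) = \frac{a_0(z_{-m}^{-n-1})A_{z_{-n}}(0)}{a_0(z_{-m}^{-n-1})A_{z_{-n}}(0)\mathbf{1}} \overset{(*)}{=} \frac{\hat{a}_0(\hat{z}_{-\hat{m}}^{-n-1})A_{z_{-n}}(0)}{\hat{a}_0(\hat{z}_{-\hat{m}}^{-n-1})A_{z_{-n}}(0)\mathbf{1}} = \hat{a}_0(\hat{z}_{-\hat{m}}^{-n}),$$

where $(*)$ follows from the fact that $A_{z_{-n}}(0)$ is a rank one matrix.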

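Now suppose $i \ge -n$ and that $a_j(z_{-m}^i) = \hat{a}_j(\hat{z}_{-\hat{m}}^i)$ for $j$ with $0 \le j \le n + i - \sum_{l=-n}^{i} \max\{J(z_{-m}^l), J(\hat{z}_{-\hat{m}}^l)\}$.

If $\mathrm{ord}(p(z_{i+1}|z_{-m}^i)) > 0$, since the leading coefficient vector of the Taylor series expansion in (9) is non-negative, $c_j(z_{-m}^{i+1}) = 0$ for all $j$ with $0 \le j \le J(z_{-m}^{i+1}) - 2$ and $c_{J(z_{-m}^{i+1})-1}(z_{-m}^{i+1}) \ne 0$. So applying Lemma 2.4 to the following expression

$$x_{i+1,-m} = \frac{c_0(z_{-m}^{i+1}) + c_1(z_{-m}^{i+1})\varepsilon + \cdots + c_l(z_{-m}^{i+1})\varepsilon^l + \cdots}{c_0(z_{-m}^{i+1})\mathbf{1} + c_1(z_{-m}^{i+1})\mathbf{1}\varepsilon + \cdots + c_l(z_{-m}^{i+1})\mathbf{1}\varepsilon^l + \cdots} = \frac{\sum_{j=0}^{\infty} c_{J(z_{-m}^{i+1})-1+j}(z_{-m}^{i+1})\varepsilon^j}{\sum_{j=0}^{\infty} c_{J(z_{-m}^{i+1})-1+j}(z_{-m}^{i+1})\mathbf{1}\varepsilon^j}, \qquad (10)$$

we conclude that for all $j$, $a_j(z_{-m}^{i+1})$ depends only on

$$c_l(z_{-m}^{i+1}), \qquad J(z_{-m}^{i+1}) - 1 \le l \le J(z_{-m}^{i+1}) - 1 + j,$$

implying that $a_j(z_{-m}^{i+1})$ depends only on (or some of)

$$a_l(z_{-m}^i), \ A_{z_{i+1}}^{(l)}(0), \qquad 0 \le l \le J(z_{-m}^{i+1}) - 1 + j.$$

A completely parallel argument also applies to the case when $\mathrm{ord}(p(\hat{z}_{i+1}|\hat{z}_{-\hat{m}}^i)) > 0$. More specifically, the statements above for the case $\mathrm{ord}(p(z_{i+1}|z_{-m}^i)) > 0$ are still true if we replace $z, x, m$ with $\hat{z}, \hat{x}, \hat{m}$, which implies that $\hat{a}_j(\hat{z}_{-\hat{m}}^{i+1})$ depends only on (or some of)

$$\hat{a}_l(\hat{z}_{-\hat{m}}^i), \ A_{z_{i+1}}^{(l)}(0), \qquad 0 \le l \le J(\hat{z}_{-\hat{m}}^{i+1}) - 1 + j.$$

Thus when $\max\{J(z_{-m}^{i+1}), J(\hat{z}_{-\hat{m}}^{i+1})\} > 0$, we have $a_j(z_{-m}^{i+1}) = \hat{a}_j(\hat{z}_{-\hat{m}}^{i+1})$ for $j$ with

$$0 \le j \le n + i - \sum_{l=-n}^{i} \max\{J(z_{-m}^l), J(\hat{z}_{-\hat{m}}^l)\} - \max\{J(z_{-m}^{i+1}) - 1, J(\hat{z}_{-\hat{m}}^{i+1}) - 1\} = n + (i+1) - \sum_{l=-n}^{i+1} \max\{J(z_{-m}^l), J(\hat{z}_{-\hat{m}}^l)\}.$$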

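If $\mathrm{ord}(p(z_{i+1}|z_{-m}^i)) = 0$, since $p(z_{i+1}|z_{-m}^i) = x_{i,-m}A_{z_{i+1}}\mathbf{1}$, necessarily we have

$$a_0(z_{-m}^i)A_{z_{i+1}}(0)\mathbf{1} \ne 0.$$

Again by Lemma 2.4 applied to expression (10), for any $j$, $a_j(z_{-m}^{i+1})$ depends only on

$$a_l(z_{-m}^i), \ A_{z_{i+1}}^{(l)}(0), \qquad 0 \le l \le j.$$

Similarly if $\mathrm{ord}(p(\hat{z}_{i+1}|\hat{z}_{-\hat{m}}^i)) = 0$, we deduce that for any $j$, $\hat{a}_j(\hat{z}_{-\hat{m}}^{i+1})$ depends only on

$$\hat{a}_l(\hat{z}_{-\hat{m}}^i), \ A_{z_{i+1}}^{(l)}(0), \qquad 0 \le l \le j.$$

Thus if $\max\{J(z_{-m}^{i+1}), J(\hat{z}_{-\hat{m}}^{i+1})\} = 0$, for any $j$ with

$$0 \le j \le n + i - \sum_{l=-n}^{i} \max\{J(z_{-m}^l), J(\hat{z}_{-\hat{m}}^l)\} = n + i - \sum_{l=-n}^{i+1} \max\{J(z_{-m}^l), J(\hat{z}_{-\hat{m}}^l)\},$$

we have $a_j(z_{-m}^{i+1}) = \hat{a}_j(\hat{z}_{-\hat{m}}^{i+1})$.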



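Now, let $t = n + (i+1) - \sum_{l=-n}^{i+1} \max\{J(z_{-m}^l), J(\hat{z}_{-\hat{m}}^l)\}$. Then one can show that $a_t(z_{-m}^{i+1})$ can be written as a sum of a first term and "other terms", where the first term is equal to 0 (since $A_{z_{i+1}}(0)$ is a rank one matrix), and the "other terms" are functions of

$$a_0(z_{-m}^i), \ldots, a_{t-1}(z_{-m}^i), \ A_{z_{i+1}}(0), \ldots, A_{z_{i+1}}^{(t)}(0). \qquad (11)$$

It follows that $a_t(z_{-m}^{i+1})$ is a function of the same quantities in (11). By a completely parallel argument as above, $\hat{a}_t(\hat{z}_{-\hat{m}}^{i+1})$ is the same function of the same quantities in (11). So we have $a_j(z_{-m}^{i+1}) = \hat{a}_j(\hat{z}_{-\hat{m}}^{i+1})$ for $j$ with

$$0 \le j \le n + (i+1) - \sum_{l=-n}^{i+1} \max\{J(z_{-m}^l), J(\hat{z}_{-\hat{m}}^l)\}.$$

Notice that

$$\sum_{l=-n}^{-1} \max\{J(z_{-m}^l), J(\hat{z}_{-\hat{m}}^l)\} \le \sum_{l=-n}^{-1} (J(z_{-m}^l) + J(\hat{z}_{-\hat{m}}^l)) \le 4k.$$

The lemma then immediately follows from the identity $p(z_0|z_{-m}^{-1}) = x_{-1,-m}A_{z_0}\mathbf{1}$ (see (4)) and the proven fact that $a_j(z_{-m}^{-1}) = \hat{a}_j(\hat{z}_{-\hat{m}}^{-1})$ for $j$ with

$$0 \le j \le n - 1 - \sum_{l=-n}^{-1} \max\{J(z_{-m}^l), J(\hat{z}_{-\hat{m}}^l)\}.$$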

□

For a mapping $v = v(\varepsilon) : [0, \infty) \to W$ analytic at $\varepsilon = 0$ and a hidden Markov sequence $z_{-n}^0$, define

$$p_v(z_{-n}^{-1}) = vA_{z_{-n}} \cdots A_{z_{-1}}\mathbf{1}, \quad \text{and} \quad p_v(z_0|z_{-n}^{-1}) = \frac{p_v(z_{-n}^0)}{p_v(z_{-n}^{-1})}.$$

Let $b_{v,j}(z_{-n}^0)$ denote the coefficient of $\varepsilon^j$ in the Taylor series expansion of $p_v(z_0|z_{-n}^{-1})$ (note that $b_{v,j}(z_{-n}^0)$ does not depend on $\varepsilon$),

$$p_v(z_0|z_{-n}^{-1}) = \sum_{j=0}^{\infty} b_{v,j}(z_{-n}^0)\varepsilon^j.$$

Using the same inductive approach as in Lemma 2.5, we can prove that

Lemma 2.6. For two mappings $v = v(\varepsilon), \hat{v} = \hat{v}(\varepsilon) : [0, \infty) \to W$ analytic at $\varepsilon = 0$, if $\mathrm{ord}(p_v(z_{-n}^{-1})), \mathrm{ord}(p_{\hat{v}}(z_{-n}^{-1})) \le k$, we then have

$$b_{v,j}(z_{-n}^0) = b_{\hat{v},j}(z_{-n}^0), \qquad 0 \le j \le n - 4k - 1.$$

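Note that for $n \le m, \hat{m}$, if $v(\varepsilon)$ (or $\hat{v}(\varepsilon)$) is equal to $p(y_{-n-1} = \cdot \,|z_{-m}^{-n-1})$ (or $p(y_{-n-1} = \cdot \,|\hat{z}_{-\hat{m}}^{-n-1})$), then $p_v(z_{-n}^0)$ (or $p_{\hat{v}}(\hat{z}_{-n}^0)$) will be equal to $p(z_{-n}^0|z_{-m}^{-n-1})$ (or $p(\hat{z}_{-n}^0|\hat{z}_{-\hat{m}}^{-n-1})$); and if for a Markov state $y$, $v(\varepsilon)$ (or $\hat{v}(\varepsilon)$) is equal to $p(y_{-n-1} = \cdot \,|z_{-m}^{-n-1}y)$ (or $p(y_{-n-1} = \cdot \,|\hat{z}_{-\hat{m}}^{-n-1}y)$), then $p_v(z_{-n}^0)$ (or $p_{\hat{v}}(\hat{z}_{-n}^0)$) will be equal to $p(z_{-n}^0|z_{-m}^{-n-1}y)$ (or $p(\hat{z}_{-n}^0|\hat{z}_{-\hat{m}}^{-n-1}y)$). It then immediately follows that

Corollary 2.7. Given fixed sequences $z_{-m}^0$, $\hat{z}_{-\hat{m}}^0$, $z_{-m}^0 y_{-m-1}$, $\hat{z}_{-\hat{m}}^0 \hat{y}_{-\hat{m}-1}$ with $z_{-n}^0 = \hat{z}_{-n}^0$ such that

$$\mathrm{ord}(p(z_{-n}^{-1}|z_{-m}^{-n-1})), \ \mathrm{ord}(p(\hat{z}_{-n}^{-1}|\hat{z}_{-\hat{m}}^{-n-1})), \ \mathrm{ord}(p(z_{-n}^{-1}|z_{-m}^{-n-1}y_{-m-1})), \ \mathrm{ord}(p(\hat{z}_{-n}^{-1}|\hat{z}_{-\hat{m}}^{-n-1}\hat{y}_{-\hat{m}-1})) \le k,$$

for $n \le m, \hat{m}$ and some $k$, we have for $j$ with $0 \le j \le n - 4k - 1$,

$$b_j(z_{-m}^0 y_{-m-1}) = b_j(\hat{z}_{-\hat{m}}^0 \hat{y}_{-\hat{m}-1}) = b_j(z_{-m}^0) = b_j(\hat{z}_{-\hat{m}}^0), \qquad (12)$$

where, slightly abusing the notation, we define $b_j(z_{-m}^0 y_{-m-1})$, $b_j(\hat{z}_{-\hat{m}}^0 \hat{y}_{-\hat{m}-1})$ as the coefficients of the Taylor series expansions of $p(z_0|z_{-m}^{-1}y_{-m-1})$, $p(\hat{z}_0|\hat{z}_{-\hat{m}}^{-1}\hat{y}_{-\hat{m}-1})$, respectively.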

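Consider expression (5). In the following, we use $p^{\langle l \rangle}(z_0|z_{-n}^{-1})$ to denote the truncated (up to the $(l+1)$-st term) Taylor series expansion of $p(z_0|z_{-n}^{-1})$, i.e.,

$$p^{\langle l \rangle}(z_0|z_{-n}^{-1}) = b_0(z_{-n}^0) + b_1(z_{-n}^0)\varepsilon + b_2(z_{-n}^0)\varepsilon^2 + \cdots + b_l(z_{-n}^0)\varepsilon^l.$$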

Theorem 2.8. For a hidden Markov chain $Z$ with normally parameterized $A(\varepsilon)$, we have for any $k \ge 0$,

$$H(Z) = H(Z)|_{\varepsilon=0} + \sum_{j=1}^{k+1} f_j \varepsilon^j \log \varepsilon + \sum_{j=1}^{k} g_j \varepsilon^j + O(\varepsilon^{k+1}), \qquad (13)$$

where the $f_j$'s and $g_j$'s for $j = 1, 2, \ldots, k+1$ are functions (more specifically, elementary functions built from log and polynomials) of $A^{(i)}(0)$ for $0 \le i \le 6k+6$ and can be computed from $H_{6k+6}(Z(\varepsilon))$.

Proof. First fix $n$ such that $n \ge n_0 = 6k + 6$. Consider the Birch upper bound on $H(Z)$

$$H_n(Z) := H(Z_0|Z_{-n}^{-1}) = -\sum_{z_{-n}^0} p(z_{-n}^0) \log p(z_0|z_{-n}^{-1}).$$

Note that for $j \ge k + 2$,

$$\sum_{\mathrm{ord}(p(z_{-n}^0)) = j} p(z_{-n}^0) \log p(z_0|z_{-n}^{-1}) = O(\varepsilon^{k+1}). \qquad (14)$$

So, in the following we only consider the sequences $z_{-n}^0$ with $\mathrm{ord}(p(z_{-n}^0)) \le k + 1$. For such sequences, since $\mathrm{ord}(p(z_0|z_{-n}^{-1})) \le \mathrm{ord}(p(z_{-n}^0)) \le k + 1$, we have

$$|\log p(z_0|z_{-n}^{-1}) - \log p^{\langle 2k+1 \rangle}(z_0|z_{-n}^{-1})| = O(\varepsilon^{k+1}); \qquad (15)$$

and by Lemma 2.5, we have

$$p^{\langle 2k+1 \rangle}(z_0|z_{-n}^{-1}) = p^{\langle 2k+1 \rangle}(z_0|z_{-n_0}^{-1}). \qquad (16)$$
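Now for any fixed $n \ge n_0$,

$$\begin{aligned}
H_n(Z) &= \sum_{z_{-n}^0} -p(z_{-n}^0) \log p(z_0|z_{-n}^{-1}) \\
&\overset{(a)}{=} \sum_{\mathrm{ord}(p(z_{-n}^0)) \le k+1} -p(z_{-n}^0) \log p(z_0|z_{-n}^{-1}) + O(\varepsilon^{k+1}) \\
&\overset{(b)}{=} \sum_{\mathrm{ord}(p(z_{-n}^0)) \le k+1} -p(z_{-n}^0) \log p^{\langle 2k+1 \rangle}(z_0|z_{-n}^{-1}) + O(\varepsilon^{k+1}) \\
&\overset{(c)}{=} \sum_{\mathrm{ord}(p(z_{-n_0}^0)) \le k+1} -p(z_{-n}^0) \log p^{\langle 2k+1 \rangle}(z_0|z_{-n_0}^{-1}) + O(\varepsilon^{k+1}) \\
&= \sum_{\mathrm{ord}(p(z_{-n_0}^0)) \le k+1} -p(z_{-n_0}^0) \log p^{\langle 2k+1 \rangle}(z_0|z_{-n_0}^{-1}) + O(\varepsilon^{k+1}), \qquad (17)
\end{aligned}$$

where (a) follows from (14); (b) follows from (15); (c) follows from (16), (14) and the fact that

$$\{z_{-n}^0 : \mathrm{ord}(p(z_{-n_0}^0)) \le k+1\} = \{z_{-n}^0 : \mathrm{ord}(p(z_{-n}^0)) \le k+1\} \cup \{z_{-n}^0 : \mathrm{ord}(p(z_{-n_0}^0)) \le k+1, \ \mathrm{ord}(p(z_{-n}^0)) \ge k+2\}.$$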
Expanding (17), we obtain:

$$H_n(Z) = H(Z)|_{\varepsilon=0} + \sum_{j=1}^{k+1} f_j \varepsilon^j \log \varepsilon + \sum_{j=1}^{k} g_j \varepsilon^j + O(\varepsilon^{k+1}),$$

where the $f_j$'s and $g_j$'s for $j = 1, 2, \ldots, k+1$ are functions dependent only on $A^{(i)}(0)$ for $0 \le i \le n_0$ and can be computed from $H_{n_0}(Z)$ (in fact for fixed $j$, $f_j$ and $g_j$ are functions dependent only on $A^{(i)}(0)$ for $0 \le i \le 6j + 6$ and can be computed from $H_{6j+6}(Z)$). In particular,

$$\sum_{\substack{\mathrm{ord}(p(z_{-n_0}^0)) \le k+1 \\ \mathrm{ord}(p(z_0|z_{-n_0}^{-1})) = 0}} -p(z_{-n_0}^0) \log p^{\langle 2k+1 \rangle}(z_0|z_{-n_0}^{-1}) \qquad (18)$$

will contribute to $H(Z)|_{\varepsilon=0}$ and the terms $\varepsilon^j$, and

$$\sum_{\substack{\mathrm{ord}(p(z_{-n_0}^0)) \le k+1 \\ \mathrm{ord}(p(z_0|z_{-n_0}^{-1})) > 0}} -p(z_{-n_0}^0) \log p^{\langle 2k+1 \rangle}(z_0|z_{-n_0}^{-1}) \qquad (19)$$

will contribute to the terms $\varepsilon^j \log \varepsilon$ and the terms $\varepsilon^j$.

Using Corollary 2.7, one can apply a similar argument as above to the Birch lower bound

$$\underline{H}_n(Z) := H(Z_0|Z_{-n}^{-1}Y_{-n-1}) = \sum_{z_{-n}^0, y_{-n-1}} -p(z_{-n}^0 y_{-n-1}) \log p(z_0|z_{-n}^{-1}y_{-n-1}).$$

For the same $n_0$, one can show that $\underline{H}_n(Z)$ takes the same form (17) as $H_n(Z)$, which implies that $H_n(Z)$ and $\underline{H}_n(Z)$ have exactly the same coefficients of $\varepsilon^j$ for $j \le k$ and of $\varepsilon^j \log \varepsilon$ for $j \le k+1$ when $n \ge n_0$. We thus prove the theorem. □

Remark 2.9. Theorem 2.8 still holds if we assume each entry of $A(\varepsilon)$ is merely a $C^{k+1}$ function of $\varepsilon$ in a neighborhood of $\varepsilon = 0$: the proof still works if "analytic" is replaced by "$C^{k+1}$", and the Taylor series expansions are replaced by Taylor polynomials with remainder. We assumed analyticity of the parametrization only for simplicity.

Remark 2.10. Note that at a Black Hole, we have $\mathrm{ord}(p(z_0|z_{-n}^{-1})) = 0$ for any hidden Markov symbol sequence $z_{-n}^0$. Thus, from the discussion surrounding expressions (18) and (19) above, we see that $f_j = 0$ for all $j$. By the proof of Theorem 2.8, Formula (13) is a Taylor polynomial with remainder; this is consistent with the Taylor series formula for a Black Hole in [8].

Remark 2.11. The proof of Theorem 2.8 shows that for $n \ge n_0$, the Birch upper and lower bounds $H_n(Z)$, $\underline{H}_n(Z)$ take the same form as in (13) with the same coefficients.
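The two Birch bounds used in the proof can be evaluated by brute-force enumeration for small $n$. The sketch below is illustrative only and is not the paper's method for extracting coefficients: the helper names are hypothetical, the stationary vector is approximated by a lazy power iteration, and the test chain is the binary symmetric channel family treated in Section 3 with an assumed input matrix `P`.

```python
import math
from itertools import product

def stationary_vector(A, iters=3000):
    """Approximate pi = pi A by iterating the lazy chain (I + A) / 2."""
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * (u + v) for u, v in zip(pi, step)]
    return pi

def advance(v, A, phi, a):
    """Return v A_a, where A_a keeps only the columns j of A with phi[j] == a."""
    n = len(A)
    return [sum(v[i] * A[i][j] for i in range(n)) if phi[j] == a else 0.0
            for j in range(n)]

def block_entropy(v0, A, phi, length):
    """Entropy (nats) of `length` consecutive outputs when the preceding state has law v0."""
    total = 0.0
    for word in product(sorted(set(phi)), repeat=length):
        v = v0
        for a in word:
            v = advance(v, A, phi, a)
        p = sum(v)
        if p > 0:
            total -= p * math.log(p)
    return total

def birch_bounds(A, phi, n):
    """Return (lower, upper) = (H(Z_0 | Z_{-n}^{-1}, Y_{-n-1}), H(Z_0 | Z_{-n}^{-1}))."""
    pi = stationary_vector(A)
    upper = block_entropy(pi, A, phi, n + 1) - block_entropy(pi, A, phi, n)
    lower = 0.0
    for y, weight in enumerate(pi):
        if weight > 0:
            start = [1.0 if i == y else 0.0 for i in range(len(A))]
            lower += weight * (block_entropy(start, A, phi, n + 1)
                               - block_entropy(start, A, phi, n))
    return lower, upper

def bsc_chain(P, eps):
    """Joint chain (X, E) for binary Markov input P through BSC(eps); states (0,0),(0,1),(1,0),(1,1)."""
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    A = [[P[x][y] * (eps if f else 1.0 - eps) for (y, f) in states] for (x, _) in states]
    return A, [x ^ e for (x, e) in states]

P = [[0.5, 0.5], [1.0, 0.0]]   # hypothetical input chain with one forbidden transition
for eps in (0.0, 0.01, 0.05):
    print(eps, birch_bounds(*bsc_chain(P, eps), 6))
```

The cost grows exponentially in $n$, so this is only practical for short windows; it is meant to make the two bounds concrete, not to replace the series computation of Theorem 2.8.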

3 Applications to Finite-State Memoryless Channels at High Signal-to-Noise Ratio

Consider a finite-state memoryless channel with stationary input process. Here, $C = \{C_n\}$ is an i.i.d. channel state process over finite alphabet $\mathcal{C}$ with $p_C(c) = q_c$ for $c \in \mathcal{C}$, $X = \{X_n\}$ is a stationary input process, independent of $C$, over finite alphabet $\mathcal{X}$ and $Z = \{Z_n\}$ is the resulting (stationary) output process over finite alphabet $\mathcal{Z}$. Let $p(z_n|x_n, c_n) = P(Z_n = z_n|X_n = x_n, C_n = c_n)$ denote the probability that at time $n$, the channel output symbol is $z_n$ given that the channel state is $c_n$ and the channel input is $x_n$. The mutual information for such a channel is:

$$I(X, Z) := H(Z) - H(Z|X) \overset{(*)}{=} H(Z) + \sum_{x \in \mathcal{X}, z \in \mathcal{Z}} p(x, z) \log p(z|x),$$

where $(*)$ follows from the memoryless property of the channel, and for $x \in \mathcal{X}$, $z \in \mathcal{Z}$,

$$p(x, z) = \sum_{c \in \mathcal{C}} p(z|x, c)p(x)p(c), \qquad p(z|x) = \sum_{c \in \mathcal{C}} p(z|x, c)p(c).$$

Now we introduce an alternative framework, using the concept of channel noise. As above, let $C$ be an i.i.d. channel state process, and let $X$ be a stationary input process, independent of $C$, over finite alphabets $\mathcal{C}$, $\mathcal{X}$. Let $\mathcal{E}$ (resp., $\mathcal{Z}$) be finite alphabets of abstract error events (resp., output symbols) and let $\Phi : \mathcal{X} \times \mathcal{C} \times \mathcal{E} \to \mathcal{Z}$ be a function. For each $x \in \mathcal{X}$ and $c \in \mathcal{C}$, let $p(\cdot|x, c)$ be a conditional probability distribution on $\mathcal{E}$. This defines a jointly distributed stationary process $(X, C, E)$ over $\mathcal{X} \times \mathcal{C} \times \mathcal{E}$. If $X$ is a first order Markov chain with transition probability matrix $\Pi$, then $(X, C, E)$ is a Markov chain with transition probability matrix $A$, defined by

$$A_{(x,c,e),(y,d,f)} = \Pi_{xy} \cdot q_d \cdot p(f|y, d),$$

and $\Phi$, $A$ define a hidden Markov chain, denoted $Z(A, \Phi)$.
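The joint transition matrix just defined is straightforward to assemble. The sketch below is a hedged illustration: the function name, the dictionary layout of the noise distributions and all numerical values are assumptions, chosen to resemble a two-state channel with binary noise.

```python
from itertools import product

def joint_chain(Pi, q, noise):
    """Build A[(x,c,e),(y,d,f)] = Pi[x][y] * q[d] * noise[(y, d)][f].

    Pi    : input transition matrix (list of rows)
    q     : i.i.d. channel state probabilities
    noise : dict mapping (input symbol, channel state) to a list p(f | y, d) over error events
    """
    n_err = len(next(iter(noise.values())))
    states = list(product(range(len(Pi)), range(len(q)), range(n_err)))
    A = [[Pi[x][y] * q[d] * noise[(y, d)][f] for (y, d, f) in states]
         for (x, _c, _e) in states]
    return states, A

# Hypothetical parameters: binary input, two channel states, binary noise.
Pi = [[0.7, 0.3], [0.4, 0.6]]
q = [0.6, 0.4]
eps = [0.01, 0.02]   # noise level in channel state 0 and 1
noise = {(y, d): [1.0 - eps[d], eps[d]] for y in (0, 1) for d in (0, 1)}
states, A = joint_chain(Pi, q, noise)
print(states[:3], [round(sum(row), 12) for row in A])
```

Rows of $A$ depend only on the current input symbol $x$, which is why the examples below show blocks of identical rows.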

We claim that the output process $Z$, described in the first paragraph of this section, fits into this alternative framework (when $X$ is a first order Markov chain). To see this, let $\mathcal{E} = \mathcal{X} \times \mathcal{C} \times \mathcal{Z}$, and define $p(e = (x, c, z)|x', c') = p(z|x, c)$ if $x = x'$ and $c = c'$, and 0 otherwise. Define $\Phi(x', c', (x, c, z)) = z$. Then, $Z = Z(A, \Phi)$ is a hidden Markov chain. So, from here on we adopt the alternative framework.

Now, we assume that $X$ is an irreducible first order Markov chain and that the channel is parameterized by $\varepsilon$ such that for each $x$, $c$, and $e$, $p(e|x, c)(\varepsilon)$ are analytic functions of $\varepsilon \ge 0$. For each $\varepsilon \ge 0$, let $A(\varepsilon)$ denote the corresponding transition probability matrix on state set $\mathcal{X} \times \mathcal{C} \times \mathcal{E}$ and $\{Z(\varepsilon)\}$ denote the family of resulting output hidden Markov chains. We also assume that there is a one-to-one function from $\mathcal{X}$ into $\mathcal{Z}$, $z = z(x)$, such that for all $c$, $p(z(x)|x, c)(0) = 1$. In other words, $\varepsilon$ behaves like a "composite index" indicating how good the channel is, and small $\varepsilon$ corresponds to high signal-to-noise ratio. Then one can verify that $A(0)$ is a weak Black Hole and $A(\varepsilon)$ is normally parameterized. Thus, by Theorem 2.8, we obtain an asymptotic formula for $H(Z(\varepsilon))$ around $\varepsilon = 0$. We remark that the above naturally generalizes to the case where $X$ is a higher order irreducible Markov chain (through appropriately grouping matrices into blocks).

In the remainder of this section, we give three examples to illustrate the idea. 

Example 3.1. [Binary Markov Chains Corrupted by BSC($\varepsilon$)]

Consider a binary symmetric channel with crossover probability $\varepsilon$. At time $n$ the channel can be characterized by the following equation

$$Z_n = X_n \oplus E_n,$$

where $\{X_n\}$ denotes the input process, $\oplus$ denotes binary addition, $\{E_n\}$ denotes the i.i.d. binary noise with $p_E(0) = 1 - \varepsilon$ and $p_E(1) = \varepsilon$, and $\{Z_n\}$ denotes the corrupted output. Note that this channel only has one channel state, and at $\varepsilon = 0$, $p_{Z|X}(1|1) = 1$, $p_{Z|X}(0|0) = 1$, so it fits in the alternative framework described in the beginning of Section 3.

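Indeed, suppose $X$ is a first order irreducible Markov chain with the transition probability matrix

$$\Pi = \begin{bmatrix} \pi_{00} & \pi_{01} \\ \pi_{10} & \pi_{11} \end{bmatrix}.$$

Then $Y = \{Y_n\} = \{(X_n, E_n)\}$ is jointly Markov with transition probability matrix (the column and row indices of the following matrix are ordered alphabetically):

$$A = \begin{bmatrix}
\pi_{00}(1-\varepsilon) & \pi_{00}\varepsilon & \pi_{01}(1-\varepsilon) & \pi_{01}\varepsilon \\
\pi_{00}(1-\varepsilon) & \pi_{00}\varepsilon & \pi_{01}(1-\varepsilon) & \pi_{01}\varepsilon \\
\pi_{10}(1-\varepsilon) & \pi_{10}\varepsilon & \pi_{11}(1-\varepsilon) & \pi_{11}\varepsilon \\
\pi_{10}(1-\varepsilon) & \pi_{10}\varepsilon & \pi_{11}(1-\varepsilon) & \pi_{11}\varepsilon
\end{bmatrix},$$

and $Z = \Phi(Y)$ is a hidden Markov chain with $\Phi(0,0) = \Phi(1,1) = 0$, $\Phi(0,1) = \Phi(1,0) = 1$.
When $\varepsilon = 0$,

$$A = \begin{bmatrix}
\pi_{00} & 0 & \pi_{01} & 0 \\
\pi_{00} & 0 & \pi_{01} & 0 \\
\pi_{10} & 0 & \pi_{11} & 0 \\
\pi_{10} & 0 & \pi_{11} & 0
\end{bmatrix}, \quad
A_0 = \begin{bmatrix}
\pi_{00} & 0 & 0 & 0 \\
\pi_{00} & 0 & 0 & 0 \\
\pi_{10} & 0 & 0 & 0 \\
\pi_{10} & 0 & 0 & 0
\end{bmatrix}, \quad
A_1 = \begin{bmatrix}
0 & 0 & \pi_{01} & 0 \\
0 & 0 & \pi_{01} & 0 \\
0 & 0 & \pi_{11} & 0 \\
0 & 0 & \pi_{11} & 0
\end{bmatrix};$$

thus both $A_0$ and $A_1$ have rank one. If the $\pi_{ij}$'s are all positive, then we have a Black Hole case, for which one can derive the Taylor series expansion of $H(Z)$ around $\varepsilon = 0$ [201 E]'; if $\pi_{00}$ or $\pi_{11}$ is zero, then this is a weak Black Hole case with normal parameterization (of $\varepsilon$), for which Theorem 2.8 can be applied and an asymptotic formula for $H(Z)$ around $\varepsilon = 0$ can be derived.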

For a first order Markov chain $X$ with the following transition probability matrix

$$\begin{bmatrix} 1-p & p \\ 1 & 0 \end{bmatrix},$$

where $0 \le p \le 1$, it has been shown [16] that

$$H(Z) = H(X) - \frac{p(2-p)}{1+p} \varepsilon \log \varepsilon + O(\varepsilon)$$

as $\varepsilon \to 0$. This result has been further generalized 0, US] to the following formula:

$$H(Z) = H(X) + f(X)\varepsilon \log(1/\varepsilon) + g(X)\varepsilon + O(\varepsilon^2 \log \varepsilon), \qquad (20)$$

where $X$ is the input Markov chain of any order $m$ with transition probabilities $P(X_t = a_0|X_{t-m}^{t-1} = a_{-m}^{-1})$, $a_{-m}^0 \in \mathcal{X}^{m+1}$, where $\mathcal{X} = \{0, 1\}$, $Z$ is the output process obtained by passing $X$ through a BSC($\varepsilon$), and $f(X)$ and $g(X)$ can be explicitly computed. Theorem 2.8 can be used to generalize (20) to a formula with higher asymptotic terms. In particular, when $P(X_t = a_0|X_{t-m}^{t-1} = a_{-m}^{-1}) > 0$ for all $a_{-m}^0 \in \mathcal{X}^{m+1}$, we have a Black Hole, in which case the Taylor series expansions of $H(Z)$ around $\varepsilon = 0$ can be explicitly computed (in principle); when $P(X_t = a_0|X_{t-m}^{t-1} = a_{-m}^{-1}) = 0$ for some $a_{-m}^0 \in \mathcal{X}^{m+1}$, we have a weak Black Hole, in which case an asymptotic formula of $H(Z)$ around $\varepsilon = 0$ can be obtained.

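Example 3.2. [Binary Markov Chains Corrupted by BEC($\varepsilon$)]

Consider a binary erasure channel with fixed erasure rate $\varepsilon$ (denoted by BEC($\varepsilon$)). At time $n$ the channel can be characterized by the following equation

$$Z_n = \begin{cases} X_n & \text{if } E_n = 0, \\ e & \text{if } E_n = 1, \end{cases}$$

where $\{X_n\}$ denotes the input process, $e$ denotes the erasure, $\{E_n\}$ denotes the i.i.d. binary noise with $p_E(0) = 1 - \varepsilon$ and $p_E(1) = \varepsilon$, and $\{Z_n\}$ denotes the corrupted output. Again this channel only has one channel state, and at $\varepsilon = 0$, $p_{Z|X}(1|1) = 1$, $p_{Z|X}(0|0) = 1$, so it fits in the alternative framework described in the beginning of Section 3.

Suppose the input $X$ is a first order irreducible Markov chain with transition probability matrix

$$\Pi = \begin{bmatrix} \pi_{00} & \pi_{01} \\ \pi_{10} & \pi_{11} \end{bmatrix},$$

and let $Z$ denote the output process. Then $Y = (X, E)$ is jointly Markov with (the column and row indices of the following matrix are ordered alphabetically)

$$A = \begin{bmatrix}
\pi_{00}(1-\varepsilon) & \pi_{00}\varepsilon & \pi_{01}(1-\varepsilon) & \pi_{01}\varepsilon \\
\pi_{00}(1-\varepsilon) & \pi_{00}\varepsilon & \pi_{01}(1-\varepsilon) & \pi_{01}\varepsilon \\
\pi_{10}(1-\varepsilon) & \pi_{10}\varepsilon & \pi_{11}(1-\varepsilon) & \pi_{11}\varepsilon \\
\pi_{10}(1-\varepsilon) & \pi_{10}\varepsilon & \pi_{11}(1-\varepsilon) & \pi_{11}\varepsilon
\end{bmatrix},$$

and $Z = \Phi(Y)$ is hidden Markov with $\Phi(0,1) = \Phi(1,1) = e$, $\Phi(0,0) = 0$ and $\Phi(1,0) = 1$.

Now one checks that

$$A_0 = \begin{bmatrix}
\pi_{00}(1-\varepsilon) & 0 & 0 & 0 \\
\pi_{00}(1-\varepsilon) & 0 & 0 & 0 \\
\pi_{10}(1-\varepsilon) & 0 & 0 & 0 \\
\pi_{10}(1-\varepsilon) & 0 & 0 & 0
\end{bmatrix}, \quad
A_1 = \begin{bmatrix}
0 & 0 & \pi_{01}(1-\varepsilon) & 0 \\
0 & 0 & \pi_{01}(1-\varepsilon) & 0 \\
0 & 0 & \pi_{11}(1-\varepsilon) & 0 \\
0 & 0 & \pi_{11}(1-\varepsilon) & 0
\end{bmatrix}, \quad
A_e = \begin{bmatrix}
0 & \pi_{00}\varepsilon & 0 & \pi_{01}\varepsilon \\
0 & \pi_{00}\varepsilon & 0 & \pi_{01}\varepsilon \\
0 & \pi_{10}\varepsilon & 0 & \pi_{11}\varepsilon \\
0 & \pi_{10}\varepsilon & 0 & \pi_{11}\varepsilon
\end{bmatrix}.$$

One checks that $A(\varepsilon)$ is normally parameterized by $\varepsilon$ and thus Theorem 2.8 can be applied. Furthermore, Theorem 2.8 can be applied to the case when the input is an $m$-th order irreducible Markov chain $X$ to obtain an asymptotic formula for $H(Z)$ around $\varepsilon = 0$.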
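The decomposition of $A(\varepsilon)$ for the erasure channel can be reproduced numerically. The sketch below is an illustration with hypothetical names and parameter values; it uses the string `'e'` for the erasure symbol and the state ordering $(0,0), (0,1), (1,0), (1,1)$ for $(x, e)$.

```python
def bec_matrices(P, eps):
    """Return A(eps) for Y = (X, E) and its parts {0: A_0, 1: A_1, 'e': A_e}."""
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    def phi(x, e):
        # The output is the erasure symbol when E = 1 and the input symbol otherwise.
        return 'e' if e == 1 else x
    A = [[P[x][y] * (eps if f == 1 else 1.0 - eps) for (y, f) in states]
         for (x, _e) in states]
    parts = {a: [[A[i][j] if phi(*states[j]) == a else 0.0 for j in range(4)]
                 for i in range(4)]
             for a in (0, 1, 'e')}
    return A, parts

P = [[0.7, 0.3], [0.4, 0.6]]   # hypothetical input chain
A, parts = bec_matrices(P, 0.0)
print(parts['e'])              # the erasure part vanishes at eps = 0
```

At $\varepsilon = 0$ the matrix $A_e$ is all zero while $A_0$ and $A_1$ each have a single non-zero column, which is the weak Black Hole situation of Definition 2.2.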

Example 3.3. [Binary Markov Chains Corrupted by Special Gilbert-Elliott Channel]

Consider a binary Gilbert-Elliott channel, whose channel state (denoted by $C = \{C_n\}$) varies as an i.i.d. binary stochastic process with $p_C(0) = q_0$, $p_C(1) = q_1$ (here the channel state varies as an i.i.d. process, rather than a generic Markov process). At time $n$ the channel can be characterized by the following equation

$$Z_n = X_n \oplus E_n,$$

where $\{X_n\}$ denotes the input process, $\oplus$ denotes binary addition, $\{E_n\}$ denotes the i.i.d. binary noise with $p_{E|C}(0|0) = 1 - \varepsilon_0$, $p_{E|C}(0|1) = 1 - \varepsilon_1$, $p_{E|C}(1|0) = \varepsilon_0$, $p_{E|C}(1|1) = \varepsilon_1$, and $\{Z_n\}$ denotes the corrupted output. For such a channel, $p_{Z|(X,C)}(1|1, c) = 1$, $p_{Z|(X,C)}(0|0, c) = 1$ at $\varepsilon = 0$ for any channel state $c$. So it fits in the alternative framework described in the beginning of Section 3.

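To see this in more detail, we consider the special case when the input $X$ is a first order irreducible Markov chain with transition probability matrix

$$\Pi = \begin{bmatrix} \pi_{00} & \pi_{01} \\ \pi_{10} & \pi_{11} \end{bmatrix},$$

and let $Z$ denote the output process. Then $Y = (X, C, E)$ is jointly Markov with (the column and row indices of the following matrix are ordered alphabetically)

$$A = \begin{bmatrix}
\pi_{00}q_0(1-\varepsilon_0) & \pi_{00}q_0\varepsilon_0 & \pi_{00}q_1(1-\varepsilon_1) & \pi_{00}q_1\varepsilon_1 & \pi_{01}q_0(1-\varepsilon_0) & \pi_{01}q_0\varepsilon_0 & \pi_{01}q_1(1-\varepsilon_1) & \pi_{01}q_1\varepsilon_1 \\
\pi_{00}q_0(1-\varepsilon_0) & \pi_{00}q_0\varepsilon_0 & \pi_{00}q_1(1-\varepsilon_1) & \pi_{00}q_1\varepsilon_1 & \pi_{01}q_0(1-\varepsilon_0) & \pi_{01}q_0\varepsilon_0 & \pi_{01}q_1(1-\varepsilon_1) & \pi_{01}q_1\varepsilon_1 \\
\pi_{00}q_0(1-\varepsilon_0) & \pi_{00}q_0\varepsilon_0 & \pi_{00}q_1(1-\varepsilon_1) & \pi_{00}q_1\varepsilon_1 & \pi_{01}q_0(1-\varepsilon_0) & \pi_{01}q_0\varepsilon_0 & \pi_{01}q_1(1-\varepsilon_1) & \pi_{01}q_1\varepsilon_1 \\
\pi_{00}q_0(1-\varepsilon_0) & \pi_{00}q_0\varepsilon_0 & \pi_{00}q_1(1-\varepsilon_1) & \pi_{00}q_1\varepsilon_1 & \pi_{01}q_0(1-\varepsilon_0) & \pi_{01}q_0\varepsilon_0 & \pi_{01}q_1(1-\varepsilon_1) & \pi_{01}q_1\varepsilon_1 \\
\pi_{10}q_0(1-\varepsilon_0) & \pi_{10}q_0\varepsilon_0 & \pi_{10}q_1(1-\varepsilon_1) & \pi_{10}q_1\varepsilon_1 & \pi_{11}q_0(1-\varepsilon_0) & \pi_{11}q_0\varepsilon_0 & \pi_{11}q_1(1-\varepsilon_1) & \pi_{11}q_1\varepsilon_1 \\
\pi_{10}q_0(1-\varepsilon_0) & \pi_{10}q_0\varepsilon_0 & \pi_{10}q_1(1-\varepsilon_1) & \pi_{10}q_1\varepsilon_1 & \pi_{11}q_0(1-\varepsilon_0) & \pi_{11}q_0\varepsilon_0 & \pi_{11}q_1(1-\varepsilon_1) & \pi_{11}q_1\varepsilon_1 \\
\pi_{10}q_0(1-\varepsilon_0) & \pi_{10}q_0\varepsilon_0 & \pi_{10}q_1(1-\varepsilon_1) & \pi_{10}q_1\varepsilon_1 & \pi_{11}q_0(1-\varepsilon_0) & \pi_{11}q_0\varepsilon_0 & \pi_{11}q_1(1-\varepsilon_1) & \pi_{11}q_1\varepsilon_1 \\
\pi_{10}q_0(1-\varepsilon_0) & \pi_{10}q_0\varepsilon_0 & \pi_{10}q_1(1-\varepsilon_1) & \pi_{10}q_1\varepsilon_1 & \pi_{11}q_0(1-\varepsilon_0) & \pi_{11}q_0\varepsilon_0 & \pi_{11}q_1(1-\varepsilon_1) & \pi_{11}q_1\varepsilon_1
\end{bmatrix},$$

and $Z = \Phi(X, C, E)$ is hidden Markov with

$$\Phi(0,0,0) = \Phi(0,1,0) = \Phi(1,0,1) = \Phi(1,1,1) = 0,$$

$$\Phi(0,0,1) = \Phi(0,1,1) = \Phi(1,0,0) = \Phi(1,1,0) = 1.$$

For some positive $k$, let $\varepsilon_0 = \varepsilon$, $\varepsilon_1 = k\varepsilon$. If $\varepsilon = 0$, one checks that

$$A_0 = \begin{bmatrix}
\pi_{00}q_0 & 0 & \pi_{00}q_1 & 0 & 0 & 0 & 0 & 0 \\
\pi_{00}q_0 & 0 & \pi_{00}q_1 & 0 & 0 & 0 & 0 & 0 \\
\pi_{00}q_0 & 0 & \pi_{00}q_1 & 0 & 0 & 0 & 0 & 0 \\
\pi_{00}q_0 & 0 & \pi_{00}q_1 & 0 & 0 & 0 & 0 & 0 \\
\pi_{10}q_0 & 0 & \pi_{10}q_1 & 0 & 0 & 0 & 0 & 0 \\
\pi_{10}q_0 & 0 & \pi_{10}q_1 & 0 & 0 & 0 & 0 & 0 \\
\pi_{10}q_0 & 0 & \pi_{10}q_1 & 0 & 0 & 0 & 0 & 0 \\
\pi_{10}q_0 & 0 & \pi_{10}q_1 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}, \quad
A_1 = \begin{bmatrix}
0 & 0 & 0 & 0 & \pi_{01}q_0 & 0 & \pi_{01}q_1 & 0 \\
0 & 0 & 0 & 0 & \pi_{01}q_0 & 0 & \pi_{01}q_1 & 0 \\
0 & 0 & 0 & 0 & \pi_{01}q_0 & 0 & \pi_{01}q_1 & 0 \\
0 & 0 & 0 & 0 & \pi_{01}q_0 & 0 & \pi_{01}q_1 & 0 \\
0 & 0 & 0 & 0 & \pi_{11}q_0 & 0 & \pi_{11}q_1 & 0 \\
0 & 0 & 0 & 0 & \pi_{11}q_0 & 0 & \pi_{11}q_1 & 0 \\
0 & 0 & 0 & 0 & \pi_{11}q_0 & 0 & \pi_{11}q_1 & 0 \\
0 & 0 & 0 & 0 & \pi_{11}q_0 & 0 & \pi_{11}q_1 & 0
\end{bmatrix}.$$

So, both $A_0$ and $A_1$ will be rank one matrices and one can check that $A(\varepsilon)$ is normally parameterized by $\varepsilon$. Again, Theorem 2.8 can be applied to the case when the input is an $m$-th order irreducible Markov chain $X$ to obtain an asymptotic formula for $H(Z)$ around $\varepsilon = 0$.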